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EFFICIENT METHODS FOR SAMPLING SPIKE TRAINS IN 
NETWORKS OF COUPLED NEURONS^ 

By Yuriy Mishchenko and Liam Paninski 

Columbia University 

Monte Carlo approaches have recently been proposed to quantify 
connectivity in neuronal networks. The key problem is to sample from 
the conditional distribution of a single neuronal spike train, given the 
activity of the other neurons in the network. Dependencies between 
neurons are usually relatively weak; however, temporal dependencies 
within the spike train of a single neuron are typically strong. In this 
paper we develop several specialized Metropolis-Hastings samplers 
which take advantage of this dependency structure. These samplers 
are based on two ideas: (1) an adaptation of fast forward-backward 
algorithms from the theory of hidden Markov models to take advan- 
tage of the local dependencies inherent in spike trains, and (2) a first- 
order expansion of the conditional likelihood which allows for effi- 
cient exact sampling in the limit of weak coupling between neurons. 
We also demonstrate that these samplers can effectively incorporate 
side information, in particular, noisy fluorescence observations in the 
context of calcium-sensitive imaging experiments. We quantify the 
efficiency of these samplers in a variety of simulated experiments in 
which the network parameters are closely matched to data measured 
in real cortical networks, and also demonstrate the sampler applied 
to real calcium imaging data. 

1. Introduction. One of the central goals of neuroscience is to under- 
stand how the structure of neural circuits underlies the processing of in- 
formation in the brain, and in recent years a considerable effort has been 
focused on measuring neural connectivity empirically [Shepherd, Pologruto 
and Svoboda (2003); Bureau, Shepherd and Svoboda (2004); Briggman and 
Denk (2006); Hagmann et al. (2007); Sato et al. (2007); Smith (2007); Hag- 
mann et al. (2008); Luo, Callaway and Svoboda (2008); Bohland et al. 
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(2009); Helmstaedter, Briggman and Denk (2009)]. "Functional" approaches 
to this neural connectivity problem rely on statistical analysis of neural ac- 
tivity observed with experimental techniques such as multielectrode extra- 
cellular recording [Hatsopoulos et al. (1998); Harris et al. (2003); Stein et al. 
(2004); Paninski (2004); Truccolo et al. (2005); Santhanam et al. (2006); 
Luczak et al. (2007); Pillow et al. (2008)] or calcium imaging [Tsien (1989); 
Cossart, Aronov and Yuste (2003); Yuste et al. (2006); Ohki et al. (2005)]. 
Although functional approaches [Brillinger (1988); Nykamp (2003); Nykamp 
(2005a); Okatan, Wilson and Brown (2005); Srinivasan et al. (2006); Rigat, 
de Gunst and van Pelt (2006); Nykamp (2007); Yu et al. (2006); Kulkarni and 
Paninski (2007)] have not yet been demonstrated to directly yield the true 
physical structure of a neural circuit, the estimates obtained via this type 
of analysis play an increasingly important role in attempts to understand 
information processing in neural circuits [Bureau, Shepherd and Svoboda 
(2004); Sato et al. (2007); Pillow et al. (2008); Mishchenko, Vogelstein and 
Paninski (2011)]. 

Perhaps the biggest challenge for inferring neural connectivity from func- 
tional data — and indeed in network analysis more generally — is the pres- 
ence of hidden nodes which are not observed directly [Nykamp (2005b, 
2007); Kulkarni and Paninski (2007); Vidne et al. (2009); Pillow and Latham 
(2007); Vakorin, Krakovska and Mcintosh (2009)]. Despite swift progress in 
simultaneously recording activity in massive populations of neurons, it is still 
beyond the reach of current technology to monitor a complete set of neurons 
that provide the presynaptic inputs even for a single neuron [though see, e.g., 
Petreanu et al. (2009) for some recent progress in this direction]. Since es- 
timation of functional connectivity relies on the analysis of the inputs to 
target neurons in relation to their observed spiking activity, the inability to 
monitor all inputs can result in persistent errors in the connectivity esti- 
mation due to model misspecification. Developing a principled and robust 
approach for incorporating such unobserved neurons, whose spike trains are 
unknown and constitute hidden or latent variables, is an area of active re- 
search in connectivity analysis [Nykamp (2007); Pillow and Latham (2007); 
Vidne et al. (2009); Vakorin, Krakovska and Mcintosh (2009)]. 

Incorporating these latent variables into connectivity estimation is a chal- 
lenging task. The standard approach for estimating the connectivity requires 
us to compute the likelihood of the observed neural activity given an esti- 
mate of the underlying connectivity. Computing this likelihood, however, 
requires us to integrate out the probability distribution over the activity of 
all hidden neurons. This latent activity variable will typically have very high 
dimensionality, making any direct integration methods infeasible. 

Thus, it is natural to turn to Markov chain Monte Carlo (MCMC) ap- 
proaches here [Rigat, de Gunst and van Pelt (2006)]. The best design of such 



SAMPLING SPIKE TRAINS 



3 



an MCMC sampler is not at all obvious in this setting, since it may be nec- 
essary to develop sophisticated proposal densities to capture the dependence 
of the hidden spike trains on the observed spiking data in order to guarantee 
a reasonable proposal acceptance rate [Pillow and Latham (2007)]. For ex- 
ample, the simplest Gibbs sampling approaches may not perform well given 
the strong dependencies between adjacent spiking timebins that are inherent 
to neural activity. 

In this paper we develop Metropolis-Hastings (MH) algorithms for effi- 
ciently sampling from the probability distribution over the activity of hidden 
neurons. We derive a proposal density that is asymptotically correct in the 
limit of weak interneuronal couplings, and demonstrate the utility of this 
proposal density on simulated networks of neurons with neurophysiologically 
feasible parameters [Sayer, Friedlander and Redman (1990); Abeles (1991); 
Braitenberg and Schuz (1998); Gomez-Urquijo et al. (2000); Song et al. 

(2005) ; Lefort, Tomm and Sarria (2009)]. We also consider a hybrid MH 
strategy based on fast hidden Markov model (HMM) sampling approaches 
that can be exploited to extend the range of applicability of the basic al- 
gorithm to more strongly coupled networks of neurons. In each case, the 
resulting MCMC chain mixes quickly and each step of the chain may be 
computed quickly. 

Of special interest is the problem of sampling from the probability distri- 
bution over unknown spike trains when a neuron is indirectly observed using 
calcium imaging (instead of electrical recordings) [Tsien (1989); Yuste et al. 

(2006) ; Cossart, Aronov and Yuste (2003); Ohki et al. (2005); Vogelstein 
et al. (2009)]. This problem can be naturally related to the sampling prob- 
lem described above. We develop an approach to incorporate calcium fluo- 
rescence imaging observations directly into our proposal density. This allows 
us to efficiently sample from the probability distribution over the activity of 
multiple hidden neurons given calcium fluorescence observations, improving 
on the methods introduced in Mishchenko, Vogelstein and Paninski (2011). 

2. Methods. 

2.1. Model definition. We model the activity of individual neurons with 
a discrete-time generalized linear model (GLM) [Brillinger (1988); Chornoboy, 
Schramm and Karr (1988); Brillinger (1992); Plesser and Gerstner (2000); 
Paninski et al. (2004); Paninski (2004); Rigat, de Gunst and van Pelt (2006); 
Truccolo et al. (2005); Nykamp (2007); Kulkarni and Paninski (2007); Pillow 
et al. (2008); Vidne et al. (2009); Stevenson et al. (2009)]: 

ni(t)~Bernouni[/(Ji(t))A], 

(1) 

TV 

Ji{t) = bi{t) + Y,Yl - 
j=it'<t 
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where the spike indicator function for neuron i, ni{t) € {0, 1}, is defined on 
time bins t of size A. Connectivity between neurons is described via the 
"connectivity matrix," Wij{t)] the self-terms Wii{t — t') describe refractory 
effects (there is a minimal "refractory" interspike interval of a millisecond 
or two in most neurons, e.g.), and Wij{t — t') represents the statistical effect 
of a spike in neuron j at time t' upon the spiking rate of neuron i at time t. 
N is the number of neurons in the neural population, including hidden and 
observed neurons, and bi{t) denotes a baseline driving term (which might 
depend on some observed covariates such as a sensory stimulus). We assume 
that the maximal firing frequency is much larger than the typical spik- 
ing rate, 3> E[ni{t)], in which case the Bernoulli spiking model in (1) is 
closely related to models in which nj(i) is drawn from a Poisson distribution 
at every time step (see the references above for a number of examples). 

In the presence of fluorescence observations from a calcium-sensitive in- 
dicator dye (the calcium imaging setting), the model (1) is supplemented 
by two variables per neuron i: the intracellular calcium concentration Ci{t) 
(which is not observed directly) and the fluorescence observations Fi(t). 
We model these terms according to a hidden Markov model governed by 
a simple driven autoregressive process [Vogelstein et al. (2009); Mishchenko, 
Vogelstein and Paninski (2011)], 



Thus, under nonspiking conditions, Ci{t) is set to the baseline level of Cf. 
Whenever the neuron fires a spike, nj(t) = 1, the calcium variable Ci{t) 
jumps by a fixed amount Ai, and subsequently decays with time constant r?; 
rf is on the order of hundreds of milliseconds in the cases of most interest 
here. The fiuorescence signal Fi{t) corresponds to the count of photons col- 
lected at the detector per neuron per imaging frame. This photon count may 
be modeled with approximately Gaussian statistics (Poisson photon count 
models are also tractable in this context [Vogelstein et al. (2009)], though 
we will not pursue this detail here), with the mean given by a saturating 
Hill-type function S{C) = C/{C + Kd) Yasuda et al. (2004) and the vari- 
ance V{C) scaling with the mean. See Vogelstein et al. (2009) for full details 
and further discussion. Note that observations of calcium-indicator fluores- 
cence are typically performed at a low "frame-rate," FR, measured in frames 
per second. Thus, we will restrict our attention to the case FR < A~^, that 
is, Fi{t) observations are available only every few timesteps A. In addition, 
it will be useful to define an effective SNR, following Mishchenko, Vogelstein 
and Paninski (2011), as 



a{t) = a{t - A) - -ia{t - a) - 



') + Aini{t) 



(2) 



i 



F,{t)r^Ar[s{a{t)),v{am. 



(3) 



eSNR 



E[F^{t)-Fi{t-A)\ni{t) = l] 



^[(F,(t) - F,(t - A))V2|n,(t) = 0]i/2 ' 



SAMPLING SPIKE TRAINS 



5 



that is, the size of a spike-driven fluorescence jump divided by a rough 
measure of the standard deviation of the basehne fluorescence. 

In Mishchenko, Vogelstein and Paninski (2011) we introduced an expecta- 
tion-maximization approach for estimating the model parameters given cal- 
cium fluorescence data; Pillow and Latham (2007) discuss a related ap- 
proach for fitting the model given spiking data recorded from extracellular 
electrodes. The maximization step in this context is often quite tractable, in- 
volving a separable convex optimization problem [Smith and Brown (2003); 
Paninski (2004); Kulkarni and Paninski (2007); Escola et al. (2011)]. The 
expectation step is much more difficult; analytical approaches are often in- 
feasible. Monte Carlo solutions to this problem require us to obtain sam- 
ples from the probability distribution over the activity of all hidden neu- 
rons. This problem is the focus of the present paper. In the context of 
the expectation maximization framework, we assume therefore that an esti- 
mate of the model parameters 6 is available, and the problem is to obtain 
a sample from P{nfiiMen\^observed'i^) (in the case that the spike trains of 
a subset of neurons is observed directly), or P{iihidden\^ observed] 

6) (in the 

case that spike trains are observed indirectly via calcium fluorescence mea- 
surements). Here and throughout we use bold notation for vectors, that is, 
n(t) = {ni{t),i = l,...,N} and n= {ni(t),i = l,...,iV;t G [0,r]}; 6 denotes 
the set of all parameters in the model, including b,w,r'^, and A. In cases 
where no confusion is possible below we will suppress the dependence on 6. 

2.2. A block-wise Gibbs approach for sampling from the distribution over 
activity of hidden neurons. In Mishchenko, Vogelstein and Paninski (2011), 
we noted that in order to sample from the desired joint distribution over 
the activity of all hidden neurons, -P(n7ij^^en |')i it is sufficient to be able 
to efficiently sample sequentially from the conditional distribution over one 
hidden neuron given all of the other hidden neurons, P(nj|nyj;-): if this is 
possible, then a sample from the full joint distribution, P{nfiiii(ign\'), can be 
obtained using a block- wise Gibbs algorithm. This blockwise Gibbs approach 
makes sense in this context because the connectivity weights Wij ,i^ j, are 
relatively weak in many neural contexts; for example, synaptic strengths 
between cortical neurons are typically fairly small (see the discussion in Sec- 
tion 2.7 below). However, dependencies between the spike indicator variables 
{ni{s),ni{t)} within a single neuron may be quite large if the time gap \s — t\ 
is small, since the self-terms wu are typically strong over small timescales; 
for example, as mentioned above, after every spike a neuron will enter a re- 
fractory state during which it is unable to spike again for some time period. 
See Toyoizumi, Rahnama Rad and Paninski (2009) for further discussion. 

Thus, we will focus below exclusively on the single-neuron conditional 
sampling problem, rij ~ P(nj|nyj; •). In the next couple sections we will dis- 
cuss methods for sampling from P(nj|n\j); in Section 2.6 we will discuss 
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Neuron 1 



Neuron 2 



Neuron 1 



Neuron 2 



Fig. 1. Graphical model describing (A) the K-order Markov model for neural dynamics 
in terms of the instantaneous spikes, Ui (t) , and (B) a simpler Markov model in terms of 
the grouped states Si{t) = {ni{t'),t ~ KA <t' <t}. 

adaptations of these methods for incorporating calcium fluorescence obser- 
vations in P(nj|nyj,Fj). 

2.3. Exact sampling via the hidden Markov model forward-backward pro- 
cedure. In some special cases we can solve the single-neuron conditional 
sampling problem quite explicitly. Specifically, if the support of each of the 
coupling terms Wij{t) is contained in the interval [OjJCA] for some suffi- 
ciently small number of time steps K, then we can employ standard hidden 
Markov model methods to sample from the desired conditional distribu- 
tion exactly. The key is to note that in this case equation (1) defines a 
Xth-order Markov process; thus, it is convenient to apply the standard sub- 
stitution Si{t) = {rij(t'),t — KA <t' < t}. Here Si{t) denotes the "state" of 
neuron i, which is completely characterized by the binary iT-string [e.g., 
Si{t) = "010011100. . . "] describing its spiking over the K most recent time 
bins. Clearly, the size of the state-space is 2^. See Figure 1 for an illustration. 

To exploit this Markov structure, note that we may write the conditional 
distributions P[si{t)\si{t - A),s\i{t - A)] and P[s\j(t + A)js\j(t), Si(t)] ex- 
plicitly using model (1), and, in fact, it is easy to see that the system 

S^{t) P[si{t)\s^{t - A),S\i{t - A)], 

(4) 

s\iit + A) ~ P[s\,it + A)|s\,(i), 

forms an autoregressive hidden Markov model [Rabiner (1989)], with s\j(t + 
A) playing the role of the observation at time t (Figure 1). Thus, if K 
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is not too large, standard finite hidden Markov model (HMM) techniques 
can be applied to obtain the desired spike train samples. Specifically, we 
can use the standard filter forward-sample backward algorithm [Rabiner 
(1989); De Gunst, Kiinsch and Schouten (2001)] to obtain samples from 
P(sj|s\j), from which we can easily transform back to obtain the desired 
sample rij. This requires 0{T2^) computation time and 0(2^) memory, 
since the transition matrix P[si{t)\si{t — A),syj(t — A)] is sparse [due to 
the redundant definition of the state variable Si{t)], so the matrix-vector 
multiplication required in the transition step scales linearly with the number 
of states, instead of quadratically, as would be seen in the general case. 
Note, for clarity, that in the above discussion we are assuming that the 
coupling terms Wij (t) are known (or have been estimated in a separate step) , 
and therefore K is also known, directly from the maximal support of the 
couplings Wij{t); thus, we do not apply any model selection or estimation 
techniques for choosing K here. 

2.4. Efficient sampling in the weak-coupling limit. Clearly, the HMM 
strategy described above will become ineffective if K becomes larger than 
about 10 or so, due to the exponential growth of the Markov state-space. 
Therefore, it is natural to seek more general alternate methods. Recall that 
the interneuronal coupling terms Wij,i j, are small. Thus, it seems promis- 
ing to examine the conditional spike train probability in the limit of weak 
coupling (wij — >■ 0), in the hope that a good sampling method might suggest 
itself. 

The log-likelihood of the hidden spike train rij can be written as 
logP(nj|nw) = log P(ni, nw) + const. 

(5) 

= 5Z / [^i (t )] - / [J^ it)] A- n^ it) ! + const . ; 

i t 

for notational simplicity, we have used a Poisson model for nj(t) given the 
past spike trains, but it turns out that we can use any exponential fam- 
ily with linear sufficient statistics in the following computations; see the 
Appendix for details. Now, if we make the abbreviation 

(6) Jr{t) = ^^Wijit- s)njis), 

s<t j 

we can easily expand to first order in Wij: 
logP(ni,n\i) 

= ^ni it) log / [Ji it)]-f[Mt)]A-niit)\ 
t 
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+ ^Y1 log + - fhit) + Ji{t)]A + const. 

= Y,Mt) log/[Ji(t)] - f[Ji{t)]A - mity. 
t 

+ - fhim) J~{t) + o{w) + const. 

= Y,r^i{t)\ogf[Ji{t)]-f[Ji{t)]/\-ni{t)\ 
t 

j^i t ^ J I j\ J\ ^ s<t k 

+ o{w) + const. 

(7) = E^^(*) log /['^'(*)] - f\J'MA - n,{t)\ 

t 

+ EEE(-.(-)7|^ - f%is)]Ay,,is - t)nm 
+ o{w) + const. 

= ^n.(t)|log/[J.(t)] + _ t)[n,{s) - /[6,(.)]A]| 

— nj(t)! + o{w) + const., 

where we have used the definition of J J' (t) in the third equality, rearranged 
sums (and changed variables) over t and s in the fourth equality, regathered 
terms in the final equality, and retained only terms involving a factor of ni(t) 
throughout. 

Thus, if we note the resemblance between equation (7) and the Poisson 
log-probability, we see that one attractive approach is to sample from a spike 
train proposal ni{t) with log-rate equal to the term within brackets, 

(8) log/[J.(t)] + ET. IWT^'^ds - t)[n,{s) - m{s)]A]; 

j^i s>t Jn{S)\ 

since we are conditioning on {nj{t)} (i.e., these terms are assumed fixed), 
and there are no nj(s) terms affecting the proposed rate of nj(t) for s > t, 
it is straightforward to sample ni{t) recursively forward according to this 
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rate, and then use Metropolis-Hastings to compute the required acceptance 
probabihty and obtain a sample from the desired distribution. 

We note that this approach is conceptually quite similar to that of Pillow 
and Latham (2007), who proposed sampling recursively from a process of 
the form 

ni(i)~Poisson[/[jf^(t)]A], 

(9) 

j S<t s>t 

Pillow and Latham (2007) discussed a somewhat computationally-intensive 
procedure in which the parameters and w^j^ are reoptimized iteratively 
via a maximum pseudolikelihood procedure. If we reverse the logic we used 
to obtain equation (7), it is clear that our approach simply uses an "effective 
input" 

Ji{t)=bi{t) + jr{t) 

(10) 

in place of jf^{t) above. Further, in the special case of an exponential non- 
linearity, /(J) = exp(J), all pre-factors in equation (10) cancel out, leading 
to an expression that resembles equation (9) quite closely: 

Jjjt) = bjjt) + y^^y^^Wij(t - s)nj{s) 

j s<t 
j^i s>t 

Thus, in the weak-coupling limit w ^ 0, and ignoring self-interaction 
terms wu, we see that the optimal "back-inputs" w[j^{t — s),s> t, from 
Pillow and Latham (2007) may be analytically identified as the time- and 
index-reversed original forward couplings, Wji{—t). 

2.5. Hybrid HMM-Metropolis-Hastings approaches. So far we have de- 
veloped two methods for sampling from P(nj|ny): the HMM method (Sec- 
tion 2.3) is exact but becomes inefficient when K is large, while the weak- 
coupling method (Section 2.4) becomes inefficient when the coupling terms w 
become large. What is needed is a hybrid approach that combines the 
strengths of these two methods. The key is that the cross-coupling terms Wij 
are typically fairly weak (as emphasized above), and the self-coupling 
terms Wii{t) are large only for a small number of time delays t. 
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To take advantage of this special structure, we begin by constructing 
a truncated HMM that retains all coupling terms Wij{t) for t up to some 
maximal delay tmax, where tmax is chosen to keep the size of the state- 
space acceptably small but large enough so that the coupling terms are 
captured to an acceptable degree. Then we include the discarded coupling 
terms (i.e., terms at lags longer than tmax) via the first-order approximation. 
More concretely, denote the conditional probability of {?^^(^)} under the 
truncated HMM as 



[recall the correspondence between the ^C-string state variable Si{t) and 
the binary spiking variable Note that this forms an inhomogeneous 

Markov chain in the grouped state variables Sj(t), and an inhomogeneous 
(imax/'^)-Markov chain in ni{t), as discussed in Section 2.3. Now we want 
to incorporate the discarded coupling terms up to the first order: we simply 
form the product 



We use this simple product form here in an analogy to the HMM case, 
in which we condition on observations by simply forming the (normalized) 
product of the prior distribution on state variables and the likelihood of the 
observations given the state variables; in this case, the first term plays the 
role of the prior over the state variables Si{t) and the second term corre- 
sponds to the pseudo-observations represented by the weak coupling terms 
incorporating the observed firing of the other neurons rij. Although, to be 
clear, this joint probability does not correspond rigorously to any small- 
parameter expansion of P(nj|nyj), we might expect that this hybrid may 
outperform alternative strategies such as using only the truncated HMM, 
given by equation (12), or only the weak-coupling correction, given by equa- 
tion (13), as we will see in the Results section below (see especially Figure 6). 
Since this product form for the joint probability is structurally equivalent 
to the conditional probability of an HMM in the state variables Si{t) — more 
precisely, the graphical model [Jordan (1999)] corresponding to the above 
expression is a chain in terms of Si{t) — we can employ the forward-backward 
procedure to sample from this proposal, and then compute the Metropolis- 
Hastings acceptance probability to obtain samples from the desired condi- 
tional P(nj|ny). 



(12) 



Y\_Phmm{Siit)\Si(t - A);n\j) 



t 



YiPhmmiSiitMsiit - A);S\i) 



(13) 
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Note that the MH acceptance probabihty will decrease as a function of the 
dimensionality T of the desired spike train and of the size of the discarded 
weights Wij{t),t > tmax; since for large w the proposal density discussed 
above will approximate the target density less accurately. In general, it is 
necessary to employ a blockwise approach: that is, we update the spike train 
{nj(t)} in blocks whose length is chosen to be small enough that the MH 
acceptance probability is sufficiently high, but large enough so that the total 
correlations between blocks are small and the overall chain mixes quickly. 
We will examine these trade-offs in more quantitative detail in the Results 
section below. 

2.6. Efficient sampling given calcium fluorescence imaging observations. 
In this section we turn our attention to the problem of sampling from the 
fluorescent-conditional distribution rij ~ P(nj|n\j,Fj). In principle, we could 
apply the same basic approach as before, exploiting the HMM structure of 
equations (l)-(2) in the variables {si{t),Ci{t)}; however, the state-space for 
the calcium variable Ci(t) is continuous (instead of discrete), requiring us 
to adapt our methods somewhat. 

We will briefly mention two alternative methods before introducing the 
novel approach that is the focus of this section. First, in Mishchenko, Vogel- 
stein and Paninski (2011) we introduced a sampler based on a technique from 
Neal, Beal and Roweis (2003). This method requires drawing a rather large 
auxiliary sample from the continuous Ci{t) state space and then employing 
a modified forward-backward approach to obtain spike train samples. The 
techniques we will discuss below do not require such an auxiliary sample, and 
are therefore significantly more computationally efficient; we will not com- 
pare these methods further here. Second, standard pointwise Gibbs sampling 
is fairly straightforward in this setting: we write P(Fj,n) = P(Fj|nj)P(n), 
and then note that both P(nj(t)|nyj f) and P(Fj|nj) can be computed easily 
as a function of Uiit). [To compute the latter quantity, note from equation (2) 
that ni{t) only affects the values of Fi[t') appreciably for t <t' < n^rf, for 
a suitably large number of time constants Uc-] We will discuss the perfor- 
mance of the Gibbs approach further below. 

Now we turn to the main theme of this section: to adapt the MH-based 
blockwise approach discussed above, it is essential to develop a proposal 
density that efficiently incorporates the observed fluorescence data, in order 
to ensure reasonable acceptance rates. We use a filter-backward-and-sample- 
forward approach. The basic idea is to compute, via a backward recursion, 
the conditional future observation density P{Fi{t:T)\si{t),Ci{t);n\i) given 
the current state {si{t),Ci{t)), for all t <T. Then, we may easily sample 
from a proposal of the form 

Si (t) ~ P{Fi {t:T)\ (t) , a (t) ; n\, ) X P(s, (t) I Si (t - A) ; n\,) , 

(14) 

Ciit) = ait - A) - A/rf (C,(t - A) - C^) + A^t), 
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and by appending the samples Si{t), <t <T, we obtain a sample from the 
desired density P(nj|n\j, Fj). Here the spiking term P{si(t)\si{t — A);n\^) 
may include weak-coupling terms or truncations to keep the state-space 
tractably bounded, as discussed in the previous section. [Again, recall the 
correspondence between the iT-string state variable Si{t) and the binary 
spiking variable ni{t).] 

To calculate P{Fi{t:T)\si{t),Ci{t)), we can use the standard backward 
HMM recursion Rabiner (1989), 

(15) P{F,{t : T)\si{t),Ci{t)) = P{Fi{t)\Ci{t))P{F,{t + A : T)\si{t),C,{t)), 
with 

P{Fi{t + A:T)\si{t),Ci{t)) 

(16) = [ dCi{t + A)P{Fiit + A:T)\siit + A),ait + A)) 

X P{s,{t + A)\s,{t);n\i)P{Ci{t + A)\s,it + A), C,(t)). 

We have already discussed the transition probability P{si{t + A)\si{t);n\i). 
The observation density P{Fi{t:T)\si{t),Ci{t)) is a continuous function 
of Ci{t). We could solve this backward recursion directly by breaking the Ci{t) 
axis into a large number of discrete intervals and employing standard nu- 
merical integration methods to compute the required integrals at each time 
step. However, this approach is computationally expensive in the high SNR 
regime [i.e., where the ratio of the spike-driven calcium bump Ai is large 
relative to the fluorescence noise scale y^V{Ci{t))], where a fine discretiza- 
tion becomes necessary. Vogelstein et al. (2009) introduced a more efficient 
approximate recursion for this density that we adapt here. The first step 
is to approximate P{Fi{t :T)\si{t),Ci{t)) with a mixture of Gaussians; this 
approximation is exact in the limit of linear and Gaussian fluourescence ob- 
servations [i.e., in the case that the S{-) is linear and V{-) is constant in 
equation (2)], and works reasonably in practice [see Vogelstein et al. (2009) 
for further details]. 

It is straightforward to see in this setting that we require 2^2^~^ mixture 
components to represent P{Fi{t:T)\si{t),Ci{t)) exactly; each term in the 
mixture corresponds to a distinct sequences of spikes ni(t:T) and initial 
conditions Si{t). For large T, such a mixture of course cannot be computed 
explicitly. However, as Vogelstein et al. (2009) pointed out, we may further 
approximate the intractable 2^2^~^ mixture with a smaller 2^ (T — t + 1) 
mixture, parametrized by the total number of spikes Ylit' =t''^ii^') instead of 
the full spike sequence nj(t:T) (since mixture components with the same 
total number of spikes overlap substantially, due to the long timescale rf of 
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the calcium decay); thus, we may avoid any catastrophic exponential growth 
in the complexity of the representation. 

In order to calculate and update this approximate mixture, we proceed as 
follows. At each timestep t we represent P{Fi{t : T)\si{t) , Ci{t)) as a mixture 
of 2^(T — t + 1) Gaussian components with weights Ps,k, means rus^k, and 
variances Vs^k, for 2^ distinct states s and k = 1, . . . ,T — t + 1. In order to 
update this mixture backward one timestep, from time t to time t — A, we 
first integrate over Cj(t + A) in equation (16). Since Ci{t) evolves determin- 
istically given ni{t), this reduces to updating the means and the variances 
in each Gaussian component, 

ms,k K,fc - A/rfC^ - Aini{t)]/[1 - A/rf], 

(17) Vs,k^Vs,k/{l-A/T^f, 
Ps,k^Ps,k/{l-^/Ti). 

Second, we perform the multiplication with the observation density P{Fi(t)\ 
Ci{t)) in equation (15). In the case that Fi(t) is linear and Gaussian in 
Ci{t), each such product is again Gaussian, and the means and the variances 
are updated as follows [Mc{t) and Vc{t) are the mean and the variance for 
P(F,(t)|Q(t)), resp.]: 

m,,fc ^ (Mc(t - A)vs,k + ms,kVc{t - A))/{vs,k + Vc{t - A)) 
Vs,k Vc{t - A)Vs,k/{Vs,k + Vc{t - A)) 

(18) p,^k ^ Ps,k X {2TT{vs,k + V,{t - A))y^'^ 



X exp 



+ 



M,{t - Af rn\k 
" 2T4(t - A) 2v,^k 
{Mc{t - A)vs,k + ms,kVc{t - A))2n 



2(u,,fc + Vc{t - A))Vc{t - A)v,^k 

More generally [i.e., in the case of a nonlinear S{-) or nonconstant V{-) in 
equation (2)], standard Gaussian approximations may be used to arrive at 
a similar update rule; again, see Vogelstein et al. (2009) for details. 

Third, we perform the summation over Sj(t + A) in equation (16), and 
reorganize the obtained mixture to reduce the number of components from 
2(T — t + 1) to (T — t + 2) for each Si{t). Equation (16) doubles each mixture 
component into two new Gaussians, corresponding to the two terms in the 
sum for ?ij(t + A) = 1 or nj(t + A) = 0. For brevity, we denote these terms 
as w+ = P{s+{t + A)\si{t)) and w" = P{s-(t + A)\si{t)), where s+(t + A) 
stands for the state Si{t + A) describing a spike at time t + A, and s~{t + A) 
denotes the absence of a spike at time t + A. We group all new components 
in pairs such that each pair corresponds to a given number of spikes on the 
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interval t : T. Each such pair of Gaussian components is then merged into 
a single Gaussian with equivalent weight, mean, and variance: 

Ps,k -^W^ + w', 
(19) rus^k {w~^ms+^k_i + w'nis-^k)/ {w^ + w~), 

Vs,k v^i, + {w+Vs+^k-i + W'Vs-^k)/ {w^ + 
and the v^f, term corresponds to the variance of the means, 

+ w (m^-^fcW -'^s,fc(t- A)) +it; ). 

See Figure 8 below for an illustration. 

2.7. Simulating populations of spiking neurons. To test the performance 
of different sampling algorithms, we simulated a population oi N = 50-800 
neurons, following the approach described in Mishchenko, Vogelstein and 
Paninski (2011). Briefly, we simulated a spontaneously active randomly con- 
nected neural network, with each neuron described by model equation (1), 
and connectivity and functional parameters of individual neurons chosen 
randomly from distributions based on the experimental data available for 
cortical networks in the literature [Sayer, Priedlander and Redman (1990); 
Braitenberg and Schuz (1998); Gomez-Urquijo et al. (2000); Lefort, Tomm 
and Sarria (2009)]. Networks consisted of 80% excitatory and 20% inhibitory 
neurons [Braitenberg and Schuz (1998); Gomez-Urquijo et al. (2000)]. Neu- 
rons were connected to each other in a sparse, spatially homogeneous man- 
ner: the probability that any two neurons i and j were connected (i.e., that 
either Wij or wji was nonzero) was 0.1 [Braitenberg and Schuz (1998); Lefort, 
Tomm and Sarria (2009)]. The scale of the connectivity weights Wij was 
matched to results from the cortical literature, as cited above, and the overall 
average firing rate of the networks was set to be about 5 Hz. The connectivity 
waveforms Wij{t) were modeled as exponential functions with time constant 
fixed for all neurons at 10 msec; for the self-coupling terms Wii{t), neurons 
strongly inhibited themselves over short time scales (an absolute refrac- 
tory effect of 2 ms) and weakly inhibited themselves with an exponentially- 
decaying weight over a timescale of 10 ms. Finally, we used an exponential 
nonlinearity, /(•) =exp(-), for simplicity. Again, see Mishchenko, Vogelstein 
and Paninski (2011) for full details and further discussion. 

3. Results. The efficiency of any Metropolis-Hastings sampler is deter- 
mined by the ease with which we can sample from the proposal density (and 
compute the acceptance probability) versus the degree to which the proposal 
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approximates the true target P{ni\n observed)- In this section we will numer- 
ically compare the efficiency of the proposal densities we have discussed 
above. In particular, we will examine the following proposal densities, listed 
in rough order of complexity: 

• Time-homogeneous Poisson process. 

• Point process with log-conditional intensity function given by the delayed 
input J~{t), equation (6). 

• Point process with log-rate determined by full effective input in weak- 
coupling approximation Jj(t), equation (10). 

• Hybrid truncated HMM proposal including weak-coupling terms [equa- 
tion (13)]. 

As a benchmark, we also compare these samplers against a simple pointwise 
Gibbs sampler, in which we draw from i-*(nj(i)|nj n^) sequentially over t. 

We begin by inspecting a simpler toy model of neural spiking. In this 
model both refractory and interneuronal coupling effects are assumed to 
be short, that is, on the time scale of w20 msec. This simplification allows 
us to characterize the neural state fully by X ~ 10 past time bins (A = 2 
ms in these simulations), with the state variable Si{t) = {ni{t'),t — KA < 
t' <t}. In this case, equation (1) describes a hidden Markov model with 
a state space that is small enough to sample from directly, using the forward- 
backward procedure detailed in Section 2.3. Thus, in this case we can obtain 
the probability distribution P{ni\n observed) explicitly, and compare different 
MH algorithms against a ground truth. 

In Figure 2 we inspect the true instantaneous posterior spiking rate ri(t) = 
P{ni{t)\nohserved) / ^ for the hidden neuron in this toy model, calculated 
exactly with the forward-backward procedure and estimated using different 
effective rates for a few of the MH proposals discussed above. We see that 
the true rate rj(t) varies widely around its mean value, implying that the 
simplest proposal density (the time-homogeneous Poisson process) will result 
in rather low acceptance rates, as indeed we will see below. Similarly, a naive 
approximation for ri{t) using only the delayed input J~{t) fails to capture 
much of the structure of rj(t). Incorporating both the past and the future 
spiking activity of the observed neurons via the weak-coupling input Ji{t) 
leads to a much more accurate approximation of the true rate (t) . 

Similar results are obtained when we apply the MH algorithm using these 
proposal densities (Figures 3 and 4). We use MH with M = 5,000 samples 
for each proposal density, with a burn-in period of 1,000 samples, to ob- 
tain both the autocorrelation functions (Figure 3) and estimates for the 
instantaneous spiking rate rj(t) (Figure 4). We observe that the homoge- 
neous Poisson proposal density leads to a very long autocorrelation scale, 
implying inefficient mixing, that is, long runs are necessary to obtain ac- 
curate estimates for quantities of interest such as rj(t). Indeed, we see in 
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Fig. 2. Quantifying the approximation accuracy of two spike train proposal densities. 
Blue trace: the true spiking rate ri{t) of one neuron conditioned on the spiking activ- 
ity of all other neurons in a population of N = 50 neurons; the rate is computed exactly 
via forward-backward procedure described in Section 2.3. A population of spontaneously 
spiking neurons with neurophysiologically feasible parameters ( Section 2. 7) was simulated 
for a total of 1 sec at a time resolution o/ A = 2 msec. Approximate spiking rates ob- 
tained using just the delayed input, J~ (t) [equation (6)], or full weak-coupling input, J{t) 
[equation (11)], are shown in red and green, respectively. (In particular, the delayed in- 
put J~ (t) and the full effective input Ji(t) are computed as described in Section 2.4, and 
then we approximate the rates using ri(t) = /[J; (t)] or f[Ji{t)].) While the proposal based 
on the delayed inputs (t) reproduces the true spiking rate somewhat poorly, the weak- 
coupling J(t) approximation is significantly more accurate. 
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Fig. 3. Autocorrelation functions for Gibbs and MH samplers. (The autocorrelation func- 
tion is averaged over all time bins in the spike tram.) Echoing the results of Figure 2, we see 
that the homogeneous ("uniform") Poisson MH algorithm mixes slowly; the sampler based 
on the full weak-coupling input Ji(t) mixes significantly more quickly than the sampler 
based on the delayed inputs J~ (t) . The standard Gibbs sampler performed about as well as 
the weak-coupling MH sampler. Simulation details are as in Figure 2, except a 10 sec spike 
train was simulated here to collect a sufficient amount of data to distinguish the different 
algorithms. 
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Fig. 4. Comparing the accuracy of MH and Gibbs samplers. Each trace indicates the 
posterior spiking rate ri{t) for one hidden neuron, estimated from 5,000 samples using 
the Gibbs sampler and the three MH samplers compared in the preceding two figures. The 
results are similar: the homogeneous Poisson proposal performs badly [in the sense that 
the ri(t) computed based on these 5,000 samples approximates the true ri{t) poorly], while 
the weak-coupling and Gibbs samplers outperform the sampler based on the delayed in- 
puts J~{t) (in terms of variance around the true rate, computed via the full forward-back- 
ward HMM method, shown in blue). MH acceptance rates, R, varied from R ~ 0.75 for MH 
using the homogeneous Poisson proposal to R~ 0.98 for MH using the proposal with full 
effective input Ji {t) . Simulation details are as in Figure 2. A shorter time interval of the 
simulated spike train is shown for greater clarity here. 

Figure 4 that 5,000 samples are insufficient to accurately reconstruct the 
desired rate rj(t). Similarly, for the proposal density based on the delayed 
inputs J~{t), the autocorrelation scale is shorter, but still in the range of 
10-20 samples. The weak-coupling proposal, which incorporates information 
from both past and future observed spiking activity, mixes quite well, with 
an autocorrelation scale on the order of a single sample, and leads to an 
accurate reconstruction of the true rate in Figure 4. Interestingly, the sim- 
plest Gibbs sampling algorithm also performs well, with an autocorrelation 
length similar to that of the best MH sampler shown here. 

We also study the performance of the weak-coupling proposal as a function 
of the strength of the interneuronal interactions in the network, and as 
a function of the number of neurons N in the population and the length T 
of the desired spike train (Figure 5). The acceptance rate falls at a rate 
approximately inverse to the coupling strength, which seems sensible, since 
this proposal is based on the approximation that the coupling strength is 
weak. We observe similar behavior with respect to the size of the neural 
population. In particular, the acceptance rate drops below i? ~ 0.1 when 
N r-.^ 600-1,000. On the other hand, increasing the spike train length T affects 
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Fig. 5. Acceptance rate of the weak- coupling MH algorithm as a function of coupling 
strength, C , neural population size, N , and spike length, T. We simulated sparsely con- 
nected networks of inhibitory and excitatory neurons as described in Section 2.7, with 
refractory effects up to 10 msec long and interneural interactions up to 50 msec long. Here 
a coupling strength value of C = 1 corresponds to the neurobiologically motivated set of 
parameters in Section 2.7 ( also in Figures 2-4 ); other values of C corresponded to scaling 
f{J{t)) f{C ■ J{t)) in equation (1). 10 sec of neural activity was simulated at a time 
resolution of A — 2 msec, with the neural population spiking at ~^-5 Hz. 64 trials of 500 
samples were simulated, with a new random neural network generated in each trial, from 
which we estimated P(ni|n\i;w). Average acceptance rate R and standard deviation are 
shown for such trials, as well as examples of several individual trials ("x"). MH algorithm 
performance degraded significantly as C or N increased. On the other hand, for larger 
values of T performance degraded much less substantially, and even for the largest T we 
examined ('160 sec) the acceptance rate remained above i?~0.4. 

performance to only a moderate degree: even when the length of the sampled 
spike train is increased from 10 sec to 160 sec, the acceptance rate remains 
above R ~ 0.4. 

One would expect that the performance of the MH algorithm in the case 
of more strongly-coupled neural networks can be improved by including 
strong short-term interaction and refractory effects explicitly into the pro- 
posal density. We discussed such an approach in Section 2.5. For weakly 
coupled networks, we expect the performance of this hybrid algorithm to be 
similar to that of our original proposal density; however, for more strongly 
coupled neural networks, we expect a better performance. This expectation 
is borne out in the simulations shown in Figure 6: the hybrid sampler (solid 
black line) performs significantly better than the original MH algorithm 
(dashed black line) and uniformly better than alternative hybrid strategies, 
for example, where only the short-scale truncated HMM is used, or where 
all interneural interactions are accounted for via the weak-coupling approx- 
imation, while the self-terms wa are accounted for via a truncated HMM 
(gray lines) . Similar results are obtained when we vary the size of the neural 
population N (Figure 6). 
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Fig. 6. Acceptance rate of hybrid HMM-MH algorithm [equation (13)] as a function 
of the overall coupling strength C and neural population size N. The hybrid algorithm 
(solid black line) performs substantially better than the original MH algorithm described 
in Section 2.4 (dashed black line), and uniformly better than alternative hybrid strategies, 
for example, where only a short-scale truncated HMM is used or where all interneural 
interactions are accounted for via weak-coupling approximation (gray lines). Simulations 
are as in Figure 5. 

In the simulation settings described above (timestep A = 2 msec, with 
couphng currents wij lasting up to 50 ms) the MH algorithm (coded in Mat- 
lab without any particular algorithmic optimization) took ~300-400 sec to 
produce 1,000 samples of 1,000 time-ticks each on a PC laptop (Intel Core 
Duo 2 GHz), dominated by the time necessary to produce 1,000 spike train 
proposals using the forward recursion. The hybrid algorithm had a simi- 
lar computational complexity (tmax = 10 ms in the truncated HMM). The 
Gibbs algorithm had a somewhat higher computational cost, due largely to 
the fact that updates of the currents Ji{t) for up to 50 ms needed to be 
performed during each step, to decide whether to flip the state of each spike 
variable nj(t). This resulted in running times for the Gibbs sampler which 
were 5-20 times slower than for the MH sampler. The Gibbs updates are 
even slower in the calcium imaging setting (discussed at more length in the 
next section), due to the fact that each update requires us to update Ci{t) 
over ricTf timesteps (recall Section 2.6) for each proposed flip of nj(t). 

3.1. Incorporating calcium fluorescence imaging observations. Next we 
examine the performance of the method we developed in Section 2.6 for sam- 
pling from P(nj|n\j,Fj). We find that this sampler performs quite well in 
moderate and high SNR settings. More precisely, for values of eSNR greater 
than ~5 [recall the definition of eSNR in equation (3)], the conditional distri- 
bution P(nj|Fj) is localized near the true spike train quite effectively, leading 
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Fig. 7. Example of observed fluorescence for low SNR ~ 2 and relatively high SNR ~ 5. 
A = 2 msec, and calcium imaging frame-rate FR — 50 Hz. 10 sec spike trains were sim- 
ulated, with the neural population spiking at ~^-5 Hz. Actual spikes of the target neuron 
are indicated with stars. 



to a high MH acceptance rate (Figures 7-9). Indeed, we find [as in Vogel- 
stein et al. (2009)] that it is possible to achieve a sort of "super-resolution" 
in the sense that the MH algorithm can successfully return P(nj|F.j) on the 
time-scale of A = 2 msec even when fluorescence observations are obtained 
at a much lower frame-rate (here FR = 50 Hz). 

Conversely, for low values of eSNR, for example, eSNR ~ 1, the backward 
density P(Fi{t :T)\ni{t),Ci{t)) becomes noninformative [i.e., relatively flat 
as a function of ni{t) and Ci{t)], and the acceptance rate of our MH algo- 
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Fig. 8. Example of P{¥i{t:T)\C'i{t)) calculated as a T-mixture of Gaussians, as a func- 
tion of time (x-axis) and Ci{t) (y-axis); colorbar indicates the probability density at time t. 
For reference, true calcium concentration is shown in red, and "observed" calcium concen- 
tration at the imaging frames [i.e., {Fi{t))] is shown with green "x." Simulation details 
are as in Figure 1, first second (50 frames) is shown for clarity. Time ticks correspond to 
A = 2 msec. Actual spike times of the target neuron are shown with red lines. 
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Fig. 9. Posterior spike probability, P{ni(t)\n\i,Fi) , estimated using MH algorithm. 
While the MH algorithm performs well for SNR = 5 and above (R^ 0.8 ), for low SNR = 2 
acceptance rate is only 7? ~ 0.01. Simulation details are as in Figure 7, first second (50 
frames) is shown for clarity. Time ticks correspond to A = 2 msec. Actual spikes of the 
target neuron are shown with asterisks. Note that in the high-SNR case the sampler suc- 
cessfully recovers the smoothly-varying P{ni{t)\n\i, Fi) on a finer time-scale (A~2 msec) 
than the original fluorescence imaging data provided (Afr ^ 20 msec). However, in the 
low-SNR case the recovered firing rate is overly spiky and variable due to the slow mixing 
speed of the M-H chain; recall that similar behavior is visible in Figure 4- 



rithm reverts to the rate obtained under the conditions of no calcium imaging 
data. However, in the low-to- moderate SNR regime (e.g., eSNR ~ 2), the 
performance of the MH sampler can drop substantially. This is primarily 
due to deviations in the shape of P{Fi{t)\Ci{t)) from Gaussian at low SNR. 
Recall that we made several approximations in computing the backward 
density: first, this density is truly a mixture of ~2-^^* components, whereas 
we approximate it with a mixture of only ~T — t components. Second, we 
assume that each mixture component is Gaussian. Although the fluores- 
cence Fi{t) is described by normal statistics given the calcium variable Ci{t), 
the relationship between the fluorescence mean and calcium transient can 
be nonlinear, and the variance may depend on Ci{t) [recall equation (2)]. 
This makes the conditional distribution of Cj(t) non-Gaussian in general, 
particularly at low-to-moderate levels of SNR (where the likelihood term is 
informative but not sufficiently sharp to justify a simple Laplace approxi- 
mation). We tested the impact of each of these approximations individually 
by constructing a set of toy models where these different approximations 
were made exact; we found that the non-Gaussianity of P{Ci{t)\Fi(t)), due 
to nonlinear dependence of Fi{t) on Ci{t), was the primary factor respon- 
sible for the drop in performance. Thus, we expect that in cases where the 
saturating function S{-) is close to linear, the sampler should perform well 
across the full SNR range (Figure 10); in highly nonlinear settings, more so- 
phisticated approximations [based on numerical integration techniques such 
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Fig. 10. Acceptance rate of MH algorithms as a function of the calcium imaging ef- 
fective SNR, equation (3). Calculations for two calcium signal models, with a toy linear 
(gray dashed) and realistic Hill (solid black) transfer function, S{Ci(t)), are shown. [See 
Vogelstein et al. (2009); Mishchenko, Vogelstein and Paninski (2011) for further details 
on the precise form of the nonlinear function S{Ci{t)); in this case, three or four spikes 
were sufficient to drive the calcium variable into a regime where the fluorescence signal 
was significantly saturated.] Performance of the MH algorithm is good both for high and 
low eSNR, but can suffer for intermediate eSNR ~ 1.1-5. This performance drop is pri- 
marily due to deviations of the conditional distribution P{Ci{t)\Fi{t)) from the Gaussian 
shape, as exemplified by the much better performance m the model where S{-) is linear and 
P{Ci(t)\Fi(t)) is thus Gaussian. Simulation details are as m Figure 7. For each eSNR 50 
simulations for different neural populations were performed, and the average and standard 
error of these simulations are shown. 



as expectation propagation Minka (2001)] may be necessary, as discussed 
in the Conclusion section below. The MH sampler here took about twice 
as long as in the fully-observed spike train case discussed in the previous 
section, largely due to the increased complexity of the backward recursion 
described in Section 2.6. 

Given the good performance of Gibbs sampling noted in the calcium-free 
setting discussed above, we also examined the Gibbs approach here. How- 
ever, we found that the Gibbs algorithm was not able to procure the samples 
successfully given calcium imaging observations. In many cases we found 
that the Gibbs sampler converged rapidly to a particular spike train close to 
the truth, but would then become "stuck." If initialized again, the sampler 
would often converge to a different spike train, close to the truth, only to 
become stuck again. This behavior is due to the fact that the conditional 
distributions P(nj(t)|nj y,Fj) can be quite sharply concentrated, leading to 
poor mixing between spike train configurations with high posterior proba- 
bility; of course, this is a common problem with the Gibbs sampler (and 
indeed, this well-understood poor mixing behavior of the standard Gibbs 
chain is what led us to develop the more involved methods presented here 
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Fig. 11. Application to real data (sample data courtesy of T. Sippy, R. Yuste, J. Vogel- 
stein). Top: observed F{t) (black) and true spike times (red) were recorded from a single 
neuron via simultaneous fluorescence imaging and intracellular patch-clamp electrophysio- 
logical recording; see Vogelstein et al. (2009) for further experimental details. Blue trace in- 
dicates posterior P{n{t)\F) computed by the hybrid sampler. Bottom: p{C{t)\'F) computed 
by the hybrid sampler. Y-axis units are arbitrary in this case and have been suppressed. 
Note that the sampler infers spikes and jumps m C(t) at the correct times. 



in the first place). Roughly speaking, the extra constraints imposed by the 
fluorescence observations in P(nj(t)|nj \j,Fj) relative to P(nj(t)|nj make 
the former distribution more "frustrated," in physics language, making it 
harder for the Gibbs sampler to reach nearby states with high posterior 
probability and leading to the relatively slower Gibbs mixing rate in the 
calcium-imaging setting. 

Finally, we applied the hybrid sampler to a sample of real calcium fluores- 
cence imaging data (Figure 11), in which F{t) and the true spike times n{t) 
were recorded simultaneously. Thus, we have access to ground truth for n[t) 
here, though of course only the observed fluorescence F is used to infer 
p{n{t)\F). The model parameters were estimated using the EM method dis- 
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cussed in Vogelstein et al. (2009); again, only the observed F was used to 
infer the model parameters, not the true spike times n{t). About 50 spikes' 
worth (5,000 fluorescence frames) of data was sufficient to adequately con- 
strain the parameters. Then we applied the hybrid sampler using these pa- 
rameters to the subset of data shown in Figure 11. The sampler does a good 
job of recovering the spike rate n(t) from the observed fluorescence data F, 
and seems to do a reasonable job of recovering the corresponding jumps in 
p{C{t)\F) that occur at spike times. Note that we do not have access to the 
true intracellular calcium concentration C(t), and therefore no ground truth 
comparisons are possible for this variable. 

4. Conclusion. In this work we developed several Metropolis-Hastings 
approaches for sampling from the conditional distribution of neuronal spike 
trains, given either the activity of other neurons in the network or calcium- 
sensitive imaging observations. The most effective approach was the hybrid 
method described in Section 2.5, which takes advantage of the fact that 
strong short-term temporal dependencies within a single spike train may 
be handled via forward-backward hidden Markov model methods, while 
weaker long-term dependencies between neurons may be handled with the 
weak-coupling expansion developed in Section 2.4. In each case, to sample 
efficiently from the spike train at time t, it is important to incorporate not 
only past but also future information (i.e., spiking observations from times 
both before and after t); Pillow and Latham (2007) made a similar point. In 
the appendix we show that these methods may be extended rather easily to 
other exponential families (not just the Bernoulli and Poisson cases of most 
interest in the neuroscience setting); further applications to weakly-coupled 
Markov chains in nonneural settings seem worth exploring. 

Two major avenues are open for future work. First, as noted in Figure 8, 
the proposed sampler suffers somewhat in the case of strongly nonlinear fluo- 
rescence observations, largely because in this case our mixture-of-Gaussians 
approximation of the backward density P[Fi{t:T)\Ci{t)] can break down. 
More sophisticated methods for approximating this density are available, 
and should be explored more thoroughly. Second, as discussed in Vogel- 
stein et al. (2010), applications of these methods to real data are ongo- 
ing, via Monte Carlo-Expectation-Maximization methods similar to those 
discussed in Vogelstein et al. (2009); Mishchenko, Vogelstein and Paninski 
(2011), with the fast sampler introduced here replacing the slower Monte 
Carlo approaches discussed in Mishchenko, Vogelstein and Paninski (2011). 
Calcium-fluorescence imaging methods have exploded in popularity over the 
last several years, and we hope the methods presented here will prove use- 
ful in quantifying the cross-correlations and effective connectivity in neural 
populations observed via fluorescence imaging and multielectrode recording 
methods. 
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APPENDIX: SAMPLING FROM A WEAKLY-COUPLED 
EXPONENTIAL FAMILY 

As mentioned in Section 2.4, it is straightforward to develop a first-order 
proposal density in the weak-coupling limit more generally, in the case that 
the variables of interest are drawn from an exponential family distribution 
with linear sufficient statistics. We begin by writing down our exponential 
family model, using slightly more compact notation than in Section 2.4: 

(21) logp(Kt}) = f{Ju)k{nit) + g{Jit) + h{nit), 

it 

with the coupling introduced via 

(22) Jit = bu+Y^ w'Juj^t-s- 

s>0,j 

As before, we may easily expand the joint log-density to the first order: 

(23) logp{{nit}) = f{Jit)Knit) + g{J^t) + Hnu) 

t 

+ E if'(bjt)k{n,t) + g'{b,t)] Y <^^,t-s 

t,i^j s>0 

(24) 

-I- const (nit) + o{w). 

Now if we introduce the assumption that the sufficient statistic is linear, 
that is, k{n) = n, then after rearranging the double sum over s and t we 
obtain 

t L s>0,ij^j 

+ h{nit) + const(nj() -|- o{w), 

where again we have suppressed terms [such as g{Jit)] which do not invol- 
ve Hit; thus, to first order, the conditional distribution of {nu} given {rijt}, 
i^j, remains within the same exponential family, but with a parameter 
shift 

(25) /( J,,) ^ /( J,,) + Y <[/'(WKt+^ + 5'(W)]- 

s>0,i=/=j 

In the canonical parameterization, /(J) = J, standard exponential family 
theory [Casella and Berger (2001)] shows that g'{bj^t+s) = —E{nj^t+s\bj,t+s), 
and the parameter shift simplifies to 

(26) Jit ^ Jit + Y '^'J\.'^j,t+^~ ^("^j^t+slbj^t+s)]- 

s>0,i^j 
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See Beck et al. (2007) for a discussion of some related results. 
As a concrete example, consider the Gaussian case: 

(27) logp{{nit}) = - '^:^init - Jitf + const., 

it 

with Jit as above. In this case we may define fi{Jit) = Jn/f^f and gi{Jit) = 
— j'lt/2af] thus, we find that the parameter shift in this case is 

J J' vJ''^ 

(28) f{Jit) = -^^^+ ^ -^[uj^t+s - bj,t+s]- 

* * s>0,ij^j i 

In this linear-Gaussian case, we may compute the exact conditional dis- 
tribution of fij via the usual Gaussian conditioning formula; the necessary 
covariance and inverse covariance matrices may be obtained via standard AR 
model computations. It is straightforward to check that this exact formula 
agrees with equation (28) up to o{w) terms in the small- w limit. 
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